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Abstract: 

Most  important  journal  papers  in  magnetics  are  selected  from  conference  records  with  quick  review 
and  subject  to  stringent  page  limits.  The  literature  as  a  result  is  unsatisfactory,  inadequately 
attributing  previous  works  and  without  sufficient  details  to  replicate  work  presented.  This  paper 
therefore  reviews  mathematical  optimization  in  synthesis  and  nondestructive  evaluation  (NDE)  by 
the  finite  element  method  in  magnetics.  The  review  identifies  the  earliest  papers.  Thereafter  this 
paper  proposes  and  establishes  the  feasibility  of  coupled  problem  optimization  using  the  genetic 
algorithm  to  avoid  mesh  induced  minima  which  hurt  gradient  based  methods.  The  genetic  algorithm, 
while  avoiding  the  need  for  derivatives,  results  in  having  to  undertake  even  more  numerous  finite 
element  solutions.  Although  the  genetic  algorithm  has  been  applied  in  optimization,  in  coupled 
systems  the  number  of  object  function  evaluations  doubles.  We  there  examine  the  use  of  graphics 
processing  units  (GPUs)  to  handle  the  immense  computational  load.  GPUs  have  recently  been 
introduced  in  finite  element  analysis  but  their  memory  limits  are  often  not  recognized  and  are 
critically  limiting  when  parallelizing  the  several  solutions  required  in  optimization.  To  overcome  this 
limit,  element-by-element  finite  element  matrix  processing  is  employed,  making  coupled  problems 
practicable  on  GPUs.  We  overcome  the  memory  limits  faced  by  others. 


1.  Inverse  Problems  for  Design 
-  A  Review 

The  direct  problem  with  which 
the  finite  element  method 
started  [1-3]  has  a  device 

governed  by  a  particular  differential  equation,  say  the  Poisson  equation 

—eV2cp  =  p 


Figure  1:  The  Typical  Forward  Problem:  Analysis 


(1) 


as  was  solved  by  the  finite  element  method  by  Zienkiewicz  and  Cheung  in  their  classic  paper  [3]. 
Once  we  have  cp  -  which  may  be  electric  potential,  pressure,  magnetic  vector  pressure  etc. 
depending  on  the  system  -  we  may  compute  performance  descriptions  like  inductance,  force,  etc. 
(Fig.  1).  That  is,  from  the  system  description,  we  compute  performance.  This  is  analysis. 


The  inverse  problem  -  the  more  practically  realistic  problem,  which  is  synthesis  -  goes  from  the  right 
hand  side  of  Fig.  1  to  its  left.  That  is,  wanting  a  performance,  computing  the  system  description  from 
it.  Thus  the  computational  design  assignment  may  be  this:  compute  the  size  and  other  descriptions 
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of  a  motor  that  can  produce  so  much  torque.  In  industry  this  was  done  by  the  cycle  of  design-make- 
test-redesign.  This  required  an  expert  to  redesign  and  took  long.  In  time  by  the  1970s  instead  of 
making  and  testing,  we  analyzed,  solving  the  direct  problem  by  the  finite  element  method. 
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Figure  2:  Pole-Faces  to  be  Shaped 

It  was  left  to  engineers  dealing  with  stress 
analysis  and  fluids  to  couple  optimization 
with  the  finite  element  method  [5-7],  and  the 
second  half  of  the  1970s  and  1980s  would  be 
the  time  for  true  synthesis  -  solving  for 
geometric  shape  and  material  values  from 

design  criteria.  The  earliest  persons  to  FiSurc  3:  Minimal  Problem  usinS  Symmetry 

automate  this  cycle  in  magnetics  were  Marrocco  and  Pironneau  in  1978  [6].  They  attempted  to 
optimize  the  shape  of  the  magnetic  pole  of  a  recording  head  so  that  the  fringing  effect  at  the  edges 
of  a  pole  could  be  countered  so  as  to  realize  the  object  of  constant  flux  density  B  in  a  the  recording 
head.  (Fig.  2  gives  a  similar  problem  with  a  repeating  alternating  pole  system  from  electric 
machinery  where  a  constant  flux  density  is  required  on  top  of  each  pole  to  facilitate  alternating 
waveform  generation.  The  minimal  boundary  value  problem  for  analysis  is  shown  in  Fig.  3). 


Marrocco  and  Pironneau  [6]  located  their  work  in  the  latter's  1976  doctoral  thesis  at  Universite 
Pierre-et-Marie-Curie  (also  known  as  UPMC  and  Paris  VI),  optimizing  structural  and  fluid  systems. 
That  is,  their  work  may  be  seen  as  parallel  to  the  1976  work  of  Arora  and  Hang  [7]  who  established 
finite  element  optimization  in  a  journal.  They  approached  this  problem  by  defining  an  object 
function  F  consisting  of  the  square  of  the  difference  between  the  computed  and  desired  flux 
densities.  Thus  the  problem  is  one  of  optimizing  -  i.e.,  minimizing  -  F  which  is  a  function  of 
parameters  plfp2,  •••  defining  the  geometry  and  which  are  computed  so  as  to  minimize  F. 


Figure  4:  Jagged  Pole  Face  of  Right  Half  of  Recording  Head 


Their  results  are  shown  in  Fig.  4.  The  nonsmooth  jagged  contour  in  Fig.  4b  that  they  realized  is 
practically  not  a  manufacturable  shape.  This  they  addressed  by  smoothening  the  pole  face  as  in  Fig. 
4c.  However,  Marrocco  and  Pironneau,  aware  of  the  problem  of  jagged  contours,  have  further 
addressed  it  to  permit  smoothening  by  allowing  nodes  to  move  only  along  prescribed  paths. 
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Although  the  comprehensive  1984  book  by  Pironneau  on  optimization  [8]  deals  with  the  theory  of 
imposing  constraints  and  applies  them  to  many  systems,  the  results  presented  from  magnetics  are 
the  same  as  presented  by  Marrocco  and  Pironneau  much  earlier  without  constraints.  Pironneau,  a 
widely  experienced  pioneer  scientist  in  optimization  and  finite  element  analysis,  particularly  in  fluids, 
was  not  focused  on  magnetics.  He  with  Marrocco  solved  this  problem  to  demonstrate  the  broad 
applicability  of  their  methods  of  finite  element  optimization  and  then  moved  on.  The  line  of  work 
broached  by  Pironneau  would  remain  untapped  for  a  while  until  another  French  group  would  pick  it 
up  using  computational  optimization  just  coming  to  the  fore  then  [9]. 

In  magnetics  the  Ecole  Nationale  Superieure  d'lngenieurs  Electriciens  de  Grenoble  (ENSIEG)  group 
led  by  J.C.  Sabonnadiere,  J.L.  Coulomb  and  G.  Meunier  would  bring  mathematical  optimization  to 
bear  on  finite  element  analysis  design  in  1989  [9].  The  ENSIEG  group,  however,  ignored  the  1978 
magnetics  paper  by  Marrocco  and  Pironneau  [6]  and  sought  inspiration  from  the  seminal  1976 
structural  optimization  paper  by  Arora  and  Hang  [7]. 


This  early  work  flowing  from  the  ENSIEG  group  used  gradients  based  methods,  steepest  descent  in 
particular  [5].  Here  the  change  in  parameters  of  device  description  {p}  is  against  the  gradient  of  the 
object  function  F  because  in  one-dimensional  analogy  the  minimum  point  is  to  the  right  of  locations 
with  negative  gradient  and  to  the  left  of  those  with  positive  gradient: 

(2) 


dF 


where  the  amount  of  change  a  is  determined  by  a  line  search  [5].  The  computation  of  the  gradient 
VF  (i.  e.,  3F/3{p})  was  previously  by  finite  difference,  computing  F  through  a  finite  element 
solution  corresponding  to  a  given  {p}  and  then  in  turn  changing  each  component  pt  by  an 
infinitesimal  amount  and  re-computing  F  to  get  dF/dpi  ~  SF/Spi.  Thus  the  component  of  VF  at 
each  iterative  step  with  n  components  of  {p}  took  n+1  finite  element  solutions  and  then  once  the 
direction  of  change  of  {p},-VF,  is  established  several  more  finite  element  solutions  were  need  to  be 
sought  during  the  line  search  as  a  in  (2)  is  progressively  increased  and  the  problem  iteratively  solved 
until  the  minimum  of  F  in  that  direction  is  identified  [5].  Each  changed  {p}  means  a  new  geometry 

and  therefore  a  new  mesh.  For  a  seamless  iterative 
process,  automatic  mesh  generators  are  required 
that  can  yield  a  mesh  corresponding  to  a  given  {p}. 

The  finite  difference  computation  of  the  gradient  of 
F  had  been  known  to  be  notoriously  inaccurate  from 
force  computations  by  the  virtual  work  principle 
[10].  However  Coulomb  [11-13]  of  the  ENSIEG  group 
identified  a  one-step  solution  for  the  computation  of 
VFfrom  the  finite  element  solution  without  resort  to 
a  second  solution  for  determining  the  change  in  F 
with  change  in  each  component  of  {p}.  This 
discovery  occurred  while  working  with  the  virtual  work  principle  for  force  computation  in  magnetics 
where  the  force  F  in  the  direction  s  is  computed  by  differentiating  the  stored  magnetostatic  energy 
1 lm :  F  =  — 3T lm/ ds.  The  finite  element  approximation  of  the  Poisson  equation  (1)  leads  to  the 
matrix  equation 
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[P]{cp]  =  {' Q }  (3) 

where  the  Dirichlet  matrix  [P]  is  usually  evaluated  numerically  from  element  coordinates  [14,  15]. 
Prior  to  numerical  evaluation  however,  [P]  is  an  explicit  function  of  the  nodal  coordinates  and 
therefore  differentiable  explicitly.  That  is,  d[P]/ds  for  force  computation  or  d[P]/dpi  in  designing 
parameters  pt  may  be  numerically  evaluated  after  differentiating  the  explicit  form  of  [P]  as  a 
symbolic  expression  in  terms  of  element  coordinates  after  mapping  nodal  coordinates  to  s  or  pt  as 
appropriate  [15,  16].  d{(p}/dpi  may  thereupon  be  computed  from,  upon  differentiating  (3): 

L  J  dVi  dVi  dVi  v  1 


Although  the  Incomplete  Cholesky  Conjugate  Gradients  (ICCG)  method  is  usually  the  preferred 
method  of  solving  matrix  equations  with  sparse  symmetric  positive  definite  coefficient  matrices  as 
from  the  finite  element  method,  in  this  particular  case  it  is  far  more  efficient  to  use  the  Cholesky 
factorization  method  as  seen  in  Fig.  5  [14,  17].  In  Cholesky's  method  most  of  the  work  is  in 
decomposing  [P]  into  its  lower  and  upper  triangular  Cholesky  factors  [L]  and  [U]  after  which  all  that 
is  left  to  do  is  the  quick  forward  elimination  and  back-substitution.  In  this  instance,  since  equation 
(3)  for  {cp}  and  the  many  equations  (4)  for  d{(p}/dpi  share  the  same  coefficient  matrix  [P],  once  the 
Cholesky  factors  are  computed  in  solving  (3)  for  {cp},  they  may  be  used  to  solve  (4)  for  d{cp}/dpi 
with  trivial  extra  work  [17].  That  is  why  in  Fig.  5  as  the  number  of  p's  goes  up,  the  solution  time  for 

Cholesky's  scheme  remains  practically  flat,  taking  up  time  only  for  the 
forward  elimination  and  back-substitution. 

This  seminal  ENSIEG  work  on  force  computation  through 
differentiation  [11-13]  led  by  1989  to  the  next  logical  step  of  applying 
derivative  information  for  gradients  based  optimization  [9,  18]. 
Although  the  Grenoble  group  too  worked  with  constraints,  perhaps 
to  ensure  basic  requirements  like  lengths  not  going  negative,  they  did 
not  apply  constraints  to  ensure  the  smoothness  of  shapes.  For 
example,  Fig.  6  from  [9]  presents  a  pole  shape  they  synthesized  for  a 
linear  change  in  the  horizontal  direction  of  the  vertical  flux  density, 
where  the  jagged  contour  is  evident. 


2.  Other  Early  Papers  on  Optimization  in  Magnetics 

This  seminal  Grenoble  paper  in  1989  [9]  following  Marrocco  and  Pironneau's  original  work  [6],  both 
from  France,  opened  up  the  subject  of  optimization  in  magnetics.  This  new  subject  innovation  had 
two  aspects:  one  -  the  key  -  the  idea  of  the  inverse  problem  posed  through  the  minimization  of  an 
object  function,  and,  two,  the  method  of  optimization  to  find  the  minimum  of  that  object  function. 


Each  magnetics  paper  that  followed,  for  the  most  part  added  a  new  aspect  in  terms  of  method  of 
optimization  while  failing  to  acknowledge  the  preceding  foundational  work  on  posing  the  inverse 
problem  through  object  functions.  Unfortunately  the  IEEE  Transactions  on  Magnetics,  Journal  of 
Applied  Physics  and  a  few  others  with  their  policy  of  quick  turn-around  through  selecting  most  of 
their  papers  from  conferences,  had  a  flurry  of  papers  that  did  not  reference  these  seminal  works  by 
Marroco  and  Pironneau  [6]  or  the  ENSIEG  group  [9].  Because  of  the  route  and  methods  of  paper 
selection  described  in  detail  in  section  4,  these  indexed-journal  papers  often  applied  a  new  method 
of  mathematical  optimization  and  effectively  purported  to  bring  inverse  problem  solution  into 
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magnetics;  whereas  they  were  really  bringing  into  magnetics  new  ways  of  minimizing  the  object 
function  introduced  to  magnetics  by  Marroco  and  Pironneau,  and  ENSIEG.  The  page  limit  with 
conference  issues  of  journals  did  not  permit  space  for  replicability  so  that  these  papers  failed  to 
allow  others  to  repeat  their  work  easily  and  build  upon  it;  the  journals  were  focused  on  results 
rather  than  method  because  of  such  stringent  page  limits. 

The  next  two  papers  on  finite  element  optimization  in  magnetics  after  the  ENSIEG  paper  came  early 
in  1990,  from  Germany,  both  selected  conference  papers  carried  in  the  same  issue  of  the  IEEE 
Transactions  on  Magnetics  [19,  20].  Russenschuck's  newness  [19]  was  in  the  methods  of 
optimization,  for  example  on  the  Rosenbrock  and  SUMT  optimization  algorithms  for  searching  for 
the  object  function's  minimum  by  gradient  methods  [5].  No  new  information  was  offered  on  how 
the  gradient  VF  of  the  object  function  was  calculated.  Seminal  sources  on  gradient  computation 
were  not  referenced.  Neither  did  they  acknowledge  the  French  works.  Nor  did  Schafer-Jotter  and 
Muller  [20]  who  gave  more  details  and  introduced  the  zeroth  order,  statistical-based  simulated 
annealing  method  to  magnetics;  however  they  do  not  seem  to  have  recognized  the  advantages  of 
not  having  to  compute  gradients. 

Likewise,  a  few  months  later  in  1990,  an  Austrian  group,  also  ignoring  the  French  work,  did  not 
mention  the  problem  of  jagged  shapes  but  avoided  the  problem  by  modeling  shapes  by  fourth,  sixth 
and  eighth  order  polynomials  and  optimizing  for  the  polynomial  coefficients  [21].  However,  it  was 
puzzling  because  it  is  well  known  that  high  order  polynomials  in  modeling  the  curve  of  Reluctivity  v 
Versus  B 2  for  steels  yield  undulating  (squiggly)  shapes  and  this  problem  is  avoided  by,  just  as  in  finite 
element  shape  functions,  several  short  shapes  of  low  order  (cubics  at  most)  with  the  values  and 
slopes  matched  at  the  boundaries  [22].  Presumably  the  results  they  presented  are  for  third  order 
polynomials  because  they  lack  the  undulations  to  be  expected.  For  when  we  tried  high  order  models 
we  did  get  highly  undulating  shape  profiles.  However,  that  paper  usefully  introduced  to  magnetics 
the  evolution  strategy  (a  variant  of  the  genetic  algorithm  brought  into  magnetics  by  a  Korean  group 
under  Hahn  to  optimize  a  coil  gun  [23].  The  solution  there  was  not  by  finite  elements  but  by  circuit 
models). 

A  Japanese  group  under  Nakata  [24],  also  as  early  as  1991  and  not  referencing  the  French  works  but 
the  later  work  of  Schafer-Jotter  and  Muller  [20],  optimized  a  magnetic  circuit  by  minimizing  a  least- 
square  object  function  by  Rosenbrock's  search  method  [5].  An  early  Italian  paper  from  Jan.  1992  [25] 
had  very  powerful  results  in  3-D  but  offered  little  information  in  terms  of  method  or  the  object 
function  except  to  say  they  used  pattern  search  with  constraints.  If  they  had  given  more  details  it 
would  have  been  seminal  for  having  branched  into  optimization  in  3-D  magnetics. 

A  British  group  [26]  brought  in  simulated  annealing  without  mentioning  the  seminal  French  work  nor 
the  work  with  simulated  annealing  of  Schafer-Jotter  and  Muller  [20].  Their  geometric  changes 
involved  no  continuous  change  of  shape  but  square  bits  that  were  switched  on  and  off  as  finite 
element  optimization  in  magnetics  proceeded  to  grow  as  a  field.  A  Korean  group  did  not  mention 
constraints  but  came  up  with  results  of  smooth  shapes  using  the  steepest  descent  method  [27]. 
Other  papers  introduced  parallel  computation  on  shared  memory  systems  for  finite  element 
optimization  [15,  17,  28]. 
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3.  Constraints,  Specialized  Meshes 

Although  several  papers  had  been  presented  on  gradients  based  optimization,  none  had  mentioned 
the  problem  of  jagged  contours  and  that  of  artificial  mesh-induced  local  minima  that  are  not  intrinsic 
to  the  physical  problem.  As  noted,  the  jagged  contour  results  were  smoothened  by  Marrocco  and 
Pironneau  [6,  8]  and  the  ENSIEG  group  paid  no  attention  to  the  jagged  contours  of  Fig.  6.  Others  had 
used  polynomial  fits  of  surfaces  thereby  avoiding  the  problem  [21]. 


No  one  among  the  authors  of  the  preceding 
papers  at  the  time  had  recognized  the 
problem  of  mesh  induced  local  minima  in 
object  functions.  As  a  geometry  changes 
and  an  element  violates  the  Delaunay 
criterion  for  meshes  [14],  the  nodal 
connections  are  changed  by  the  mesh 
generator  thereby  making  the  evaluated 
object  function  undergo  a  C1  discontinuity.  The  local  minima  encountered  by  researchers  (according 
to  numerous  personal  communications  at  the  time  such  as  with  the  authors  of  [25])  were  thought  to 
be  systemic  (i.e.,  inherent  to  the  problem  being  solved)  and  bypassed  by  restarting  the  iterations 
from  another  point. 

Recognizing  this  problem  Subramaniam  et  al.  [29]  suggest  a  tunneling  function  and  an  assortment  of 
algorithms  so  that  when  one  fails  another  may  be  started  to  identify  the  minimum.  However,  the 
real  cause  of  the  local  minima  in  the  object  functions,  would  soon  be  identified  as  owing  to  changes 
in  nodal  connections  as  the  shape  evolved  under  synthesis  [30,  31].  Two  solutions  were  offered: 
either  keep  the  nodal  connections  fixed  so  that  as  geometric  changes  occur  the  meshes  are 
elastically  pulled  or  crunched,  or,  alternatively,  use  a  zeroth  order  search  method  so  that  the  local 
mesh-induced  minima  are  not  a  problem.  In  light  of  this  knowledge,  when  Marrocco  and  Pironneau's 
paper  [6]  was  examined,  it  turned  out  that  they  had,  working  in  an  era  where  there  were  no 
automatic  mesh  generators,  used  an  elastically  distorted  mesh  with  the  connections  not  changing  as 
shown  in  Fig.  7.  This  had  given  them  smooth  object  functions  without  their  recognition  that  the 
specific  mesh  changes  they  used  had  made  them  avoid  the  attendant  problem  of  mesh-induced  local 
minima.  A  special  mesh  generator  elastically  to  deform  the  mesh  while  keeping  nodal  connections 
fixed  was  created  by  Krishnakumar  for  his  doctoral  thesis  [32]. 

To  address  jagged  contours  a  few  successful  approaches  have  been  proposed  and  employed. 
Explaining  using  the  pole  face  described  by  n  parameter  heights  plt  p2, ... ,  pn  as  shown  in  Fig.  8,  the 
object  function  is  defined  by 

F{{V})  =  \Y.U{BJ  - 1)2  (5) 

where  Bj  is  the  vertical  component  of  flux  density  at  the  measuring  points  i  which  are  also  shown  in 
Fig.  6.  B? at  the  9  measuring  points  would  be  forced  to  the  value  1  T  when  F  goes  down  to  its  lowest 
value  0.  Those  were  times  -  the  1980s  -  when  shape  optimization  was  just  taking  off.  The 
mathematical  optimization  formulation  used  by  Pironneau  in  his  book  was  complete.  He 
implemented 


Minimize  F  =  F({p} 


(6a) 


UNCLASSIFIED 


subject  to  the  equality  constraints 

Htttp})  =  0  i  —  1,2,  ...k 


(6b) 


and  inequality  constraints 

G*({p})<  0  i  =  1,2, ...  I  (6c) 

Inequality  constraints  of  the  type  /;{p)  >  0  may  be  recast  as  in  (6c):  G;({p})  =  — 7i({p})  <  0.  Those 
of  the  form  a  <  /;{p}  <  b  as  Gi  =  (Jt  —  a)(Ji  —  b)  <  0.  This  form  with  upper  and  lower  limits  is 
often  normalized  as  Gt  =  (/*  —  a)(Ji  —  b)/(b  —  a)  <  0). 


Figure  8:  Parameterized  Pole  Face  Description  Vector  {p} 


a)  Free  b)  Constrained  Slope  Change  c)  Strucural  Mapping  d)  Diminishing  Heights 


Figure  9:  Smoothening  Shape  by  Constraints 


As  already  mentioned  Marrocco  and  Pironneau  [6]  did  not  use  constraints  in  magnetics  in  1978  or 
1984  to  enforce  the  smoothness  of  the  pole-face.  Nor  did  the  ENSIEG  group  [9].  With  constraints  it 
has  been  shown  that  jagged  contours  result  as  in  Fig.  4b  and  Fig.  9a.  But  with  constraints  forcing  the 
straight  line  segments  from  point  i-1  to  point  i  to  be  within  say  a  certain  angle  of  the  slope  from 
point  i  to  point  i+1, 


\Pi-i-Pi  Pi~Pi+ 1| 


<  6 


\Xi-l-Xi  Xi-Xi+l 

shaped  surfaces  would  be  of  the  form  seen  in  Fig.  9b  [33].  The  result  can  be  relatively  smooth  but 

undulating  and  could  be 


(7) 


objectionable  to  those  called  to 
manufacture  a  recording  head  like 
this,  although  it  meets  the  design 
object  perfectly.  Using  the  multiple 
solutions  there  exist  for  this 
problem,  another  approach  regards 
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the  recording  head  artificially  as 
made  up  of  a  rubber  like  substance 
so  that  if  pressed  down  by  a  pin¬ 
point  force  it  would  deform 
smoothly  as  in  Figs.  9d  and  10  [31]. 
Thus  the  question  is  recast  as  "What 
is  the  set  of  pin-point  forces  Fpp 
which  when  applied  to  the  surface, 
deform  it  to  a  form  that  would  yield 
a  uniform  flux  density  -  i.e,  minimize 
the  object  function  F.  The  artificially 
assigned  mechanical  properties  of 
as  shown  in  Fig.  11.  Thus  it  is  a  two 
part  problem  whose  importance  is  in  laying  the  mathematical  groundwork  for  coupled  field 
problems  [34].  A  structural  problem  solves  for  {pjfrom  pin-point  forces  { Fpp } 

[S]{p]  =  {FPP}  (8) 

and  the  magnetics  problem  gives  the  magnetic  vector  potential  {71}  sourced  by  current  density  J: 

[P]{A]  =  {]}  (9) 

Since  optimization  requires  gradients  with  respect  to  the  parameters  of  design,  in  this  case  the  pin¬ 
point  forces,  we  need 


c .  . .  TT.  .  density  to  achieve  sinusoidal  flux 

a.  Starting  Design  - 

Figure  11:  Structurally  Deformed  Alternator  Optimization 


the  material  ensure  a  smooth  profile  of  the  shaped  geometry 


+ 


dF  _  dF 

dFpp  dFj 

dF/dFpp  accounts  for  explicit  appearances  of  ft 


dF  d{A } 


(10) 


lpp  . w.  w.  , pp  i n  F  and  is  nearly  always  zero.  dF/dA  is  easily 

obtained  because  the  object  function  is  defined  in  terms  of  the  flux  density  B  and  the  curl  of  A  is  B. 
We  now  compute  d{A}/dFpp,  where  {A}  and  Fpp  are  in  the  electrical  and  structural  systems,  from 

d{A]  _  d{A }  d{p } 


dFpp  d{p }  dFpp 


(ID 


The  two  derivatives  on  the  right  hand  side  are  obtained  by  differentiating  and  solving  the  equations 


(8)  and  (9)  for  the  derivatives.  Alternatively  we  can  work  without  forces  but  some  part  of  {p} 
replaced  by  forced  displacements.  Fig.  11  from  Weeber's  doctoral  work  [31,  35]  shows  the  designed 


alternator  to  yield  a  sinusoidal  stator  flux  distribution. 


A  third  option  is  to  work  with  constraints  on  the  heights  p^numbered  from  the  left  to  right  of  the 
form 

Pi  >  p2  >  p3  >  -Pn  (12) 

This  will  yield  a  solution  as  in  Fig.  9d,  again  giving  an  object  function  going  down  to  0  because  of  the 
multiplicity  of  solutions  possible. 


Because  this  problem  of  shaping  a  pole  to  yield  a  constant  flux  density  has  numerous  solutions  it  has 
now  come  to  be  a  standard  benchmark  problem  for  testing  software  purporting  to  optimize 
geometries  [36]. 


Weeber  and  Hoole  [37],  recognizing  that  much  of  the  work  in  magnetics  is  reinventing  the  wheel 
already  invented  under  structural  analysis  and  that  there  was  a  wealth  of  information  already  in  that 
literature  unknown  to  the  magnetics  community  in  electrical  engineering,  brought  several  methods 
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from  structural  and  civil  engineering  into  electromagnetics.  Thus  the  important  subregion  method 
was  brought  into  magnetics  and  the  part  where  the  shape  is  being  optimized  made  a  subregion  so 
that  computations  were  greatly  simplified  [38]. 

4.  Quality  of  Magnetics  Papers  and  Special  Via-Conference  Journal  Issues 

Papers  on  numerical  methods  are  difficult  to  evaluate.  Unlike  with  old  explicit  solutions  where  to 
accept  an  argument  one  simply  follows  the  authors'  derivations,  with  numerical  methods  a  reviewer 
for  proper  assessment  needs  to  program  the  method  again  repeating  work  which  might  have  been 
done  over  a  doctoral  thesis.  This  cannot  practicably  be  done. 

Recognizing  this,  thanks  to  several  conferences  on  magnetics  having  arrangements  for  selected 
papers  to  be  quickly  carried  by  journals,  many  important  results  have  seen  the  light  of  day  instead  of 
having  to  languish  for  long  periods  for  review.  But  as  a  result,  as  we  have  seen  in  the  previous  review 
of  papers  in  this  important  area  of  magnetics,  past  work  was  not  properly  attributed  and  results 
needing  a  lot  more  development  were  published  without  providing  sufficient  detail  to  replicate 
them. 

By  far  the  vast  majority  of  papers  in  computational  electromagnetics  appear  in  the  IEEE  Transactions 
on  Magnetics  and  the  Journal  of  Applied  Physics  via  the  IEEE  Conference  on  Electromagnetic  Field 
Computation  (for  which  the  first  author  was  once  in  charge  in  various  capacities),  and  the 
COMPUMAG,  Intermag,  and  Magnetism  and  Magnetic  Materials  Conferences.  Regular  papers  with 
leisurely  review  and  no  page  limits  are  comparatively  few  in  these  journals.  However,  respectability 
of  results  has  become  an  issue  as  a  result.  The  papers  being  author-prepared  for  quick  turnaround, 
poorly  typed  papers  in  bad  grammar  are  not  unknown.  In  the  days  of  ribbon  printers  such  little 
attention  was  given  by  the  IEEE  to  printing  quality  in  the  rush  to  meet  deadlines,  that  some  papers 
given  in  readable  quality  for  printing  in  indexed  transactions  were  totally  unreadable  because  of 
photo-reproducing  without  proper  exposure  [39]. 

As  a  conference  is  planned,  a  publication  date  is  set  and  a  maximum  of  2  months  is  given  for  review 
for  quick  turnaround  of  a  few  hundred  papers.  As  a  result,  whereas  with  regular  journals  a  paper  is 
carefully  matched  to  reviewers  and  when  reviewers  say  they  are  not  competent  new  reviewers  are 
found;  on  this  accelerated  route,  reviewers  are  set  well  before  a  paper  is  submitted  and  the  tight 
schedule  does  not  allow  reviewers  to  be  easily  changed  (although  that  is  occasionally  done).  A  great 
disservice  appears  to  have  been  done  to  engineering  science  by  this  rush.  Thus  for  an  issue  of  one 
IEEE  Transactions  on  Magnetics  the  first  author  was  asked  to  review  3  papers  over  the  4-day 
duration  of  the  conference  in  Tokyo  without  access  to  his  library  or  facilities  for  internet  searches. 
Whatever  he  read  was  in  between  sessions  and  dinners  with  friends  after  the  day's  proceedings. 
While  that  may  be  an  extreme  example,  reviews  are  generally  inadequate.  For  example  even  when 
two  weeks  were  given  for  review  at  the  reviewer's  usual  office,  when  the  first  author  of  this  paper 
was  the  Guest  Editor  for  selecting  papers  from  such  a  conference,  a  paper  presenting  methodology 
and  equations  based  on  the  finite  element  method  had  meshes  and  results  from  the  boundary 
element  method.  In  the  hurry  the  2  reviewers  had  accepted  the  paper!  (It  was,  of  course,  rejected). 

If  a  reviewer  asks  for  a  major  change,  even  a  good  paper  is  effectively  rejected  for  lack  of  time.  So 
reviewers  are  reluctant  to  ask  for  major  changes.  If  a  section  needs  elaboration  for  repeatability  of 
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results,  reviewers  are  again  reluctant  to  ask  for  that  because  what  can  be  said  in  the  3  or  4  pages 
allowed  the  author?  This  is  how  in  magnetics  a  lot  of  partial  work  and  papers  without  attribution  to 
previous  works  have  made  it  into  indexed  journals.  In  the  worst  oddity,  a  paper  rejected  for  the 
Journal  of  Applied  Physics  from  the  MMM  Conference,  still  appears  as  a  one-page  summary  in  the 
journal  (because  it  was  presented),  giving  indexed  journal  respectability  to  a  rejected  paper.  An 
example  is  cited  as  reference  [40]. 

Getting  a  paper  is  thus  a  matter  of  funds  for  travel  and  registration  plus  the  multiple-paper 
surcharges  at  conferences,  and  then  playing  the  lottery  where  the  more  papers  one  throws  into  the 
conference,  the  better  the  chance  of  something  getting  accepted.  Typically  it  costs  $3000  to  get  a 
paper  by  this  route  (for  conference  registration,  hotel  and  subsistence,  and  travel).  As  a  result  unlike 
in  the  old  days  these  journals  give  better  access  to  those  who  can  pay,  while  the  quality  of  papers 
suffers.  The  professional  societies  standing  behind  these  prestigious  conferences  make  a  lot  of 
money. 

When  these  issues  were  raised  with  the  IEEE,  the  response  was  that  poor  authors  may  submit  to  the 
regular  issues.  But  the  continued  free  access  to  regular  papers  is  poor  compensation  given  the  2-plus 
years  that  regular  papers  can  take  and  the  higher  quality  that  regular  papers  need  for  acceptance. 
These  journals  have  argued  that  there  is  no  loss  of  quality  because  the  rate  of  acceptance  for  regular 
and  via-conference  papers  is  similar.  That  is  a  fallacious  argument  because  authors  are  very  careful 
about  what  they  submit  to  a  regular  journal  but  going  the  conference  route  they  sometimes  submit 
several  papers  knowing  the  acceptance  rate  will  guarantee  a  proportion  to  be  accepted.  It  is  like 
comparing  the  products  of  1)  a  university  that  carefully  admits  100  students  and  graduates  90%  and 
2)  another  university  admitting  10,000  students  and  graduating  90%.  The  products  cannot  be  of  the 
same  quality  even  though  both  have  a  90%  retention  rate. 

A  new  dimension  of  the  problem  as  journals  attempt  reform  (recognizing  the  problems  without 
admitting  to  them)  is  from  their  imposing  a  limit  on  the  number  of  pages  of  each  special  conference 
issue.  Previously  there  could  be  any  number  of  pages  so  long  as  the  conference  paid  for  them.  Now, 
in  a  context  where  nearly  all  reviewers  are  attendees  at  the  conference,  if  a  reviewer  accepts  a 
paper,  it  could  be  a  vote  against  his  or  her  own  paper! 

Until  these  problems  are  addressed  papers  in  magnetics  will  continue  to  suffer  in  quality. 

5.  Prospects  for  Further  Extensions 

5.1  Extensions 

In  magnetics  a  lot  of  important  work  has  been  done  in  device  shape  optimization  since  1978.  The 
purpose  of  this  section  is  to  establish  future  directions  for  harnessing  the  power  of  these  methods. 

5.2  Applications  to  NDE:  Assessment  of  Severity  of  Interior  Defects 

Hoole,  working  with  the  ENSIEG  group  on  a  Summer  assignment  just  after  their  seminal  work  in 
1989,  would  use  the  fact  that  the  ENSIEG  methods  could  be  equally  applied  to  non-destructive 
evaluation  (NDE)  [16].  That  is,  where  design  synthesis  computed  the  design  vector  {p}  to  match 
electromagnetic  fields  corresponding  to  design  goals  (which  are  the  performance  specifications),  in 
NDE  the  interior  defect  is  described  by  the  design  vector  {p}  and  the  performance  is  the  externally 
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measured  fields.  That  is  we  find  the  defect  that  would  match  exterior  measurements  through  a 
least-square  object  function  expressing  the  difference  between  the  measurements  and  the  fields 
computed  in  the  presence  of  the  defect.  Thereby  the  shape  and  location  of  the  defect  are  identified. 


These  methods  have  since  been  carried  into  applications  such  as  testing  for  cracks  in  oil  pipelines, 
underground  petrol  storage  tanks,  nuclear  reactors,  etc.  [41-44].  An  external  current  coil  is  taken 
over  the  structure  subject  to  testing  (Fig.  12).  Knowing  the  response  field  when  the  structure  is 
defect  free,  a  change  in  measured  field  is  used  to  flag  a  defect. 


Figure  12:  Problem-Specific  Parametric  Mesh  Generator 


Flowever,  when  there  is  an 
inaccessible  defect,  it  is  important  to 
know  the  severity  of  the  problem.  For 
army  ground  vehicles  hulls  with  minor 
rusting  setting  in,  the  current  testing 
methods  would  flag  a  defect  and  the 
vehicle  withdrawn  from  deployment. 
But  so  withdrawing  it  without  proper 
assessment  of  the  defect  may  be  an 
unwarranted  waste.  That  is,  the 
nature  of  the  defect  has  to  be 
assessed  to  avoid  wasteful  withdrawal 
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a.  Initial  Mesh 


b.  Elastically  Deformed  Final  Mesh 
Figure  13:  Mesh  Generator  (Limited 
Shape) 


from  service.  Having  the  response  waveform  from 
measurement,  we  postulate  a  defect  defined  by 
dimensional  parameters  pt  as  in  Fig.  12  Not  knowing  the 
actual  shape,  the  more  parameters  we  have  the  better  for 
accurate  assessment  of  defect. 

To  establish  feasibility  for  this  method,  we  need  a  special 
mesh  generator  modeling  the  crack  defined  by  parametric 
location  and  shape.  We  created  such  a  mesh  generator  for 
establishing  feasibility  as  shown  in  Figs.  12  and  13b, 
created  a  crack  and  computed  the  fields  along  measuring 
points  outside  the  steel  plate.  As  the  dimensions  {p}  of  Fig. 
12  change  during  optimization,  the  mesh  is  crunched  and 
pulled  from  the  starting  mesh  of  Fig.  13a  to  the  final 
design  of  Fig.  13b.  This  mesh  is  therefore  suitable  for 
testing  gradient  methods  as  well  as  zeroth  order  methods. 
But  it  needs  further  generalization  to  model  all  manner  of 
defects. 

Taking  the  results  as  the  "measured  field,"  we  had  to 
"discover"  the  shape  and  location  of  this  crack  as 
described  by  parameters  {p}.  The  shape  was  identified  by 
the  genetic  algorithm  as  well  as  simulated  annealing  (with 
the  former  working  better  as  discussed  below).  The  final 
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defect  is  seen  in  Fig  13b.  and  the 
magnetic  fields  in  Fig.  14.  Ultimately  this 
problem  would  need  to  be  done  in  3-D 
where  the  computational  load  would  be 
high  because  each  component  of  the  3- 
component  magnetic  vector  potential 
would  be  complex  [14]. 


Our  experience  is  that  gradient 
techniques  are  fast  in  computation  but 
slow  to  set  up  because  of  the 
programming  time  to  have  special  mesh 
generators.  Going  by  the  literature  Preis 
etal.  [45],  staunch  advocates  of  the 
zeroth  order  evolution  strategy,  merely 
say  it  is  competitive  with  its  higher  order  deterministic  counterparts  (which  we  take  to  mean  the 
same  in  time  at  best),  but  claim  its  "robustness  and  generality"  are  superior.  This  we  agree  with 
because  search  methods  will  never  see  mesh-induced  artificial  local  minima  as  a  problem.  In 
contrast  to  what  they  say,  Simkin  and  Trowbridge  [26]  aver  that  simulated  annealing  and  the 
evolution  strategy  take  many  more  function  evolutions.  This  is  also  our  experience  and  we  would 
add  that  the  genetic  algorithm  works  faster  than  simulated  annealing.  Haupt  [46]  advises  that  the 
genetic  algorithm  is  best  for  many  discrete  parameters  and  the  gradient  methods  for  where  there 
are  but  a  few  continuous  parameters.  We  have  gone  up  to  30  continuous  parameters  using  gradient 
methods  without  problems.  There  seemed  good  reasons  to  go  either  way. 


For  now  therefore  in  this  feasibility  study,  a  zeroth  order  method  like  the  genetic  algorithm  or 
simulated  annealing  would  be  best.  We  therefore  decided  to  test  both  methods,  simulated 
annealing  and  genetic  algorithm,  both  zeroth  order  methods  where  no  derivative  calculations  are 
required.  The  least  square  object  function  F  was  defined  as  the  square  of  the  difference  between 
the  exterior  fields  we  need  to  get  and  those  computed  from  the  postulated  crack.  The  fitness 
function  /  =  1/(1  +  F)  converged  to  almost  1.  The  genetic  algorithm  worked  faster  than  simulated 
annealing  as  seen  from  Tables  1  and  2.  For  the  genetic  algorithm,  a  comparable  object  function  is 
computed  from  F  =  (1  —  /)//.  It  is  seen  that  the  genetic  algorithm  reaches  a  comparable  object 
function  value  much  faster  than  simulated  annealing. 


Table  1:  Performance  of  Genetic  Algorithm 


Population 

Size 

Error  (%) 

f  Fitness 

Function 

F,  Object 
Function 

Time  (s) 

10 

6.7 

0.94xl0'4 

1067.0 

2.00 

20 

5.46 

0.0018 

554.55 

3.88 

50 

2.09 

0.9974 

0.002 

9.65 

Table  2:  Performance  of  Simulated 


Annealing 


Iterations 

F,  Object 
Function 

Time  (s) 

500 

0.0448 

14.25 

1000 

0.0146 

28.22 

4000 

0.0282 

119.12 

40000 

0.0075 

1144.42 
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5.3  Coupled  Problem  Shape  Optimization 

Coupled  field  problems  are  where  one  field 
system  with  the  shapes  to  be  optimized 
influences  another  field  system  in  which  the 
objects  of  design  are  defined.  A  good 
example  is  electro-heating  [47].  The  electrical 
system  provides  the  joule  system  that  has  to 
be  shaped  so  as  to  produce  a  particular  heat 
distribution.  Two  example  industry 
applications  are  metal  forming  where  molten 
metal  is  heated  through  heavy  currents  to 
produce  the  forces  to  make  the  molten  metal  subject  to  the  extrusion  or  turning  forces  we  want  [48, 
49].  The  application  of  the  ENSIEG  force  computation  methods  to  this  problem  is  obvious.  A  second 
example  is  hyperthermia  treatment  for  oncology  where  exterior  electrodes  attempt  to  burn  interior 
cancerous  tissue  [50]. 

The  procedure  for  coupled  problem  optimization  is  summarized  in  Fig.  15.  As  noted  we  have  already 

solved  this  by  gradient  techniques  and  chain 
rule  differentiation  [34]  using  the  same 
procedure  as  used  for  deforming  the  finite 
element  mesh  under  pin  point  forces  [31,  35]. 


In  comparing  methods  of  optimization  generally 
by  far  the  gradient  based  methods  are  the 
fastest  as  already  noted.  However  as  also  noted 
they  are  difficult  to  program  and  to  set  up 
because  of  the  requirements  on  elastic  meshes 
to  avoid  mesh-induced  local  minima  which  are 
fatal  to  gradient  optimization.  But  we  have 
successfully  optimized  the  shape  of  a  conductor 
so  that  just  above  it  along  a  straight  line  the 
temperature  is  constant  -  corresponding  to  the 
coupled  boundary  values  problem  of  Fig.  16 
showing  a  quarter  of  the  system  where  the  current  in  the  conductor,  subject  to  eddy  effects  [14], 
heats  it  thereby  producing  a  temperature  profile  around  it.  We  wish  to  have  a  constant  temperature 
along  the  line  of  measuring  points  shown  above  the  conductor.  The  object  function  then  is  the  sum 
of  the  squares  of  the  difference  between  computed  temperature  T({p})  and  the  desired  constant 
temperature  at  the  measuring  points.  To  this  end  of  optimizing  the  shape  we  have  already 
computed  the  gradients  using  chain-rule  differentiation  using  special  mesh  generators  that  avoid 
mesh-induced  minima  [30,  33].  In  our  experience,  for  coupled  problems  gradient  based  methods  are 
all  the  more  difficult  because  the  meshes  for  the  two  field  systems  are  often  different  and  the 
programming  very  problem  specific  and  tough.  Another  method  is  therefore  required  without  any 
requirement  for  gradients  -  that  is  the  genetic  algorithm  which  has  been  found  to  be  satisfactory 
except  for  needing  numerous  matrix  solutions.  In  coupled  problems  this  computational  load, 
however,  will  be  twice  as  much. 
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Figure  16:  The  Boundary  Value  Problem 


Figure  15:  Procedure  for  Coupled  Electrothermal 
Optimization 
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5.4  GPU  in  Coupled  Problem  Optimization 

We  wish  to  resort  to  the  genetic  algorithm  but  it  involves  many  function  calls  -  that  is  many  new 
meshes  and  a  finite  element  solution  for  each.  One  way  to  reduce  the  computational  load  is  to 
resort  to  parallel  computation  on  shared  memory  machines  [15,  17].  However,  shared  memory 
machines  are  expensive  and  usually  come  with  8  or  16  processors  (because  of  technical  difficulties  in 
sharing  the  memory  between  processors)  so  that  parallelization  is  limited. 

Recently  using  the  graphics  processing  unit  (GPU)  on  PCs  has  been  proposed  and  implemented  for 
finite  element  solution  [51].  The  GPUs  are  cheap  and  by  default  come  with  every  PC.  They  allow 
multiple  launches  of  a  computational  kernel  on  many  GPU  strings  in  parallel. 


However,  what  is  not  readily  recognized  -  for  example  not  mentioned  in  [46]  -  in  the  computational 
literature  although  stated  in  hardware  manuals  is  that  there  is  a  GPU  memory  limit,  presently  at  4 

GB.  For  us  the  limit  is  real  and  has  been  recently 
encountered  in  finite  element  analysis  [52,  53].  When  we 
try  to  process  several  finite  element  solutions 
simultaneously  in  parallel  we  do  hit  the  limit.  To  test  the 
limits  we  ran  a  problem  with  ever  increasing  size.  The 
results  are  presented  in  Table  3.  The  limit  was  reached 
around  a  matrix  size  of  10,000  by  10,000. 


This  limit  is  too  small  for  us  when  several  such  matrix 
equations  have  to  be  processed  simultaneously  on 
different  strings  on  a  GPU.  When  sparse  and  profile 
storage  schemes  are  employed  [14]  much  bigger  matrix 
sizes  are  possible  -  for  example  with  the  more  efficient 


Table  3:  Hitting  the  4  GB  Limit  at  Matrix 
Size  104xl04 


Size 

Storage  (in  MB) 

Normal 

Profile 

Sparse 

100 

0.04005 

0.00443 

0.00652 

400 

0.06866 

0.04127 

0.01686 

900 

3.10707 

0.12714 

0.03632 

1600 

9.79614 

0.28701 

0.07027 

2500 

23.88954 

0.54378 

0.11368 

6000 

137.44354 

1.54289 

0.27344 

8000 

244.29321 

2.66143 

0.35942 

10000 

381.66046 

4.08210 

0.45021 

sparse  storage  we  have  processed  matrices  of  size 
close  to  108xl08  before  reaching  4  GB  as  in  Table  4 
based  on  Table  3  and  additional  computations.  Curve 
fitting  was  employed  to  project  some  of  the 
information  in  Table  4  because  only  the  sparse  storage 
scheme  goes  up  to  4  GB,  necessitating  the  other 
storage  schemes  to  be  theoretical  at  matrix  sizes 
which  are  not  feasible  within  4  GB.  These  are  huge 
matrices. 


Nonetheless  in  optimization  when  we  launch  several 
strings  in  parallel  even  the  efficient  sparse  storage 
scheme  can  be  limiting  and  we  must  seek  out  new 
methods  of  handling  this  storage  and  workload. 
However,  we  note  that  the  use  of  the  sparse  scheme  of  storage  necessitates  an  iterative  method 
without  matrix  decomposition;  for  matrix  decomposition  would  require  profile  storage. 


Table  4:  Projected  Memory 


Matrix 

Size 

Regular 

(MB) 

Profile 
(in  MB) 

Sparse 

(MB) 

20000 

1525.2 

14.7265 

0.90 

30000 

3430.5 

32.1285 

1.35 

50000 

9526.5 

87.0594 

2.25 

100000 

38097.0 

341.7935 

4.51 

500000 

9.5224e+05 

8417.7 

22.58 

1000000 

3.8089e+06 

33608 

45.17 

5000000 

9.5220e+07 

8.390e+05 

225.85 

10000000 

3.8088e+08 

3.35e+06 

451.70 

50000000 

9.5220e+09 

8.39e+07 

2258.50 

100000000 

3.8088e+10 

3.35e+08 

4517.00 
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Kiss  et  al.  [52]  too  have  run  into  memory  limits  but  at  a  smaller  matrix  size  of  3,836,282  while  solving 
a  single  matrix  equation  (as  opposed  to  our  108xl08).  In  their  via-conference  IEEE  Transactions  paper 
few  details  are  given  to  permit  following.  Dziekonski  et  al.  [53]  as  we  have  noted  have  bypassed  the 
problem  by  using  multiple  GPU  clusters  which  are  not  commonly  available  and  it  is  technically 
difficult  for  most  finite  element  analysts  who  would  not  be  competent  dealing  with  hardware 
arrangements  on  their  own.  So  large  coupled  problems  with  two  field  systems  to  be  solved,  for  all 
practical  purposes,  are  beyond  the  capabilities  of  GPU  memory.  In  fact  we  found  that  we  quickly 
exceeded  the  memory  limit  and  our  program  crashed. 


To  overcome  this,  we  used  a  method  of  the  early  1980s  when,  working  with  the  then  new  IBM  PC 
286  machine,  we  had  a  memory  limit  of  612  kB  which  could  not  hold  even  a  trivial  matrix  in  memory. 
What  we  used  to  do  [54]  was  employ  the  Jacobi  methods  of  matrix  solution  (power  systems 
engineers  call  it  Gauss-Seidel)  in  a  modified  form.  For  example  in  solving  (3),  [P]{<p}={Q},  the  Gauss- 
Seidel  iterations  commonly  used  by  power  engineers,  is  an  improvement  on  the  older  Gauss 
iterations.  In  Gauss-Seidel  in  each  iteration  m+1  we  use  the  latest  available  values  of  the  unknowns 
cp,  using  equation  i  of  (3)  to  compute  cpt  treating  only  cpt  as  the  unknown  and  all  the  other  variables 
as  known  and  given  by  their  latest  values  in  the  iteration  cycle: 


<Pi 


m+l 


(i-1  n 

^ Pik<pk+1  +  ^  Pik<pk 

k= 1  ki+l 


(13) 


with  obvious  modifications  for  i  =1  and  i  =  n.  In  this  algorithm  cpi-^  must  be  computed  before 
(pi.  Here  at  iteration  m+l,  computing  cpt  in  the  order  i=l  to  n,  cp  is  at  values  of  iteration  m+l  up  to 
the  (i-l)th  component  of  { cp  }  and  at  the  value  of  the  previous  iteration  m  for  values  after  i.  It  is 
therefore  necessarily  a  sequential  algorithm.  The  older  displaced  Gauss  iterations  uses  the  old 
iteration's  value  for  computing  all  cpt  in  iteration  m+l.  Therefore  the  computation  of  a  particular 
Rvalue  is  independent  of  the  computation  of  all  other  cpt  values  and  therefore  parallelizable: 

(i-i  n  ^ 

Qi-^PikVk  -  ^  Pik<Pk  |  (14) 

k= 1  k=i+ 1  ) 


This  is  inefficient  in  the  context  of  sequential  computations.  But  in  this  case  of  parallelization  as 
several  strings  on  the  GPU,  it  is  highly  efficient.  But  the  problem  of  memory  needs  to  be  addressed 
as  each  string  must  carry  [P].  We  may  address  this  by  not  forming  the  matrix  [P].  If  [D]  is  the  matrix 
[P]  with  all  off  diagonal  elements  eliminated,  then  Gauss's  iterations  in  this  modified  form  gives, 

[D]{A}m+1  =  {Q}-[P-D]{Ar  (15) 

Thus  without  forming  [P],  the  operation  of  the  right  hand  side  of  (15)  can  be  effected  by  taking  each 
finite  element  in  turn,  computing  the  local  3x3  Dirichlet  matrix  [P]L  and  using  that  because 

[P]  =  ZElements[P]L  (16) 
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So  as  each  [P]L  is  formed  in  each  string,  the  three  values  of  {cp}m  may  be  taken  and  subtracted  as  in 
the  right  hand  side  of  (14).  We  tested  this  and  the  results  are  shown  in  Fig.  17.  We  were  able  to  go 
up  to  matrix  sizes  100,000x100,000,  speed-up  computation  not  being  possible  for  higher  sizes  with 
normal  storage.  We  project  being  able  to  go  well-beyond  to  100,000,000x100,000,000  without 
hitting  memory  limits. 

We  then  applied  the  same  idea  to  the  more  efficient  Incomplete  Cholesky  Preconditioned  Conjugate 
gradients  (ICCG)  method  [14]  as  laid  out  by  Mahinthakumar  and  Hoole  [55]  and  Carey  et  al.  [56]  for 
shared  memory  systems  with  very  similar  speedup  of  about  130  -  impossible  on  a  shared  memory 
system  where  with  16  processors  speedup  will  be  below  15,  accounting  for  one  processor  for 
coordination  of  the  other  15  and  time  for  exchange  of  information  between  processors 

In  this  work,  the  Incomplete  Cholesky  Conjugate  Gradients  matrix  solver  was  parallelized  on  the  GPU 
and  we  observed  a  speed-up  of  146.351  for  the  matrix  size  10,000x10,000  (Fig.  17).  The  shaped 
conductor  is  shown  in  Fig.  18.  In  the  GA  kernel,  implemented  here  there  was  no  internal 
parallelization  -  that  is  no  parallelization  of  the  matrix  computation  routines  within  the  genetic 
algorithm  such  as  of  matrix  computations.  This  could  have  been  done  for  even  greater  gain. 

In  some  ways  element-by-element  work  this  is  like  what  Kiss  et  a/.[52]  have  done  but  only  with  ICCG. 
While  they  applied  it  to  the  solution  of  one  matrix  equation,  we  have  solved  an  optimization 
problem  on  a  GPU.  We  have  applied  that  to  Gauss-Seidel  with  multiple  matrix  equation  solutions  for 
optimization.  Whereas  Dziekonski  et  al.  [53]  have  encountered  memory  limits  working  with  a  single 
matrix  equation  of  3,836,282  degrees  of  freedom  by  their  element  by  element  method,  we  are  able 
to  go  up  to  100,000,000  degrees  of  freedom  without  hitting  any  limit.  At  least  for  what  was 
compared,  we  can  run  100/3.862  or  about  26  parallel  optimizations  streams  for  problems  as  big  as 
that,  and  numerous  times  more  for  smaller  everyday  problems. 

This  number  26  is  well  above  the  number  of  parameters  being  usually  optimized  for  designs  so  that 
such  approaches  are  feasible  for  genetic  algorithm  parallelization  on  GPUs. 
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Conclusions 

This  paper  has  reviewed  finite  element  optimization  in  magnetics  and  its  natural  extension  to 
nondestructive  evaluation.  Much  good  work  has  been  done  in  the  subject  area  of  magnetics  but 
because  of  the  culture  of  the  journals  in  which  this  work  is  published,  attribution  to  previous  works 
is  poor  and  the  emphasis  is  on  results  rather  than  repeatable  methodology.  Perhaps  with  good  but 
misguided  intentions  of  fast  dissemination  of  results,  the  subject  area  of  finite  element  field 
computation  in  magnetics  appears  to  have  invited  an  insidious  culture  of  quick  and  easy  publication 
that  is  harmful  to  the  subject.  In  the  long  term  this  will  be  seen  as  invidious  by  colleagues  from  other 
areas  of  science  and  engineering  who  compete  for  the  same  awards,  recognitions,  and  promotions. 

The  review  of  the  work  in  finite  element  optimization  in  magnetics  has  identified  the  original  papers 
as  coming  from  French  scientists.  This  paper  has  pointed  out  future  directions  of  this  subject  in 
coupled  field  problem  optimization  with  the  genetic  algorithm,  the  development  of  mesh  generators 
based  on  parametric  description  and  the  use  of  GPUs  to  accelerate  the  process  through  modified 
algorithms.  The  mesh  generator  developed  conforms  to  the  requirements  for  smooth  object 
functions  so  that  future  work  can  compare  zeroth  and  first  order  methods  in  magnetics. 

The  genetic  algorithm  used  in  optimization  requires  numerous  function  evaluations.  In  coupled 
problems  this  is  doubled.  So  the  use  of  GPUs  is  critical  to  parallelize  the  computations  and  speed 
them  up.  But  that  also  doubles  the  memory  requirements  on  GPUs  which  suffer  severe  memory 
limits.  Others  have  used  element-by-element  processing  for  matrix  solution  processing  to  reduce 
memory  loads.  But  they  too  have  run  up  against  these  memory  limits.  To  overcome  these,  the  use  of 
clusters  of  GPUs  has  been  reported  but  this  is  not  satisfactory  because  of  their  not  being  widely 
available.  We  have  been  able  to  solve  much  larger  problems  than  reported  on  a  single  GPU  using 
sparse  storage,  without  recourse  to  clusters  by  element-by-element  processing  with  Jacobi's  method 
which  takes  less  memory  than  ICCG.. 

Significant  speed  up  of  146  has  been  shown  by  using  element-by-element  iterative  approaches  to 
finite  element  matrix  solution.  The  memory  savings  achieved  without  multicore  systems  offers 
promise  for  running  several  strings  of  genetic  algorithm  kernels  without  hitting  memory  limits.  It  is 
also  possible  to  parallelize  each  string  into  several  parts  parallelizing  the  matrix  solution  scheme  in  a 
kernel  because  of  the  availability  of  memory  freed  up  by  the  element-by-element  approach.  Some 
preliminary  work  has  been  presented  to  establish  the  feasibility  of  these  suggestions  with  26  design 
variables. 

Future  directions  must  engage  more  general  mesh  generators  to  model  any  shape  of  crack  and  three 
dimensional  analysis  of  eddy  current  fields  to  locate  and  characterize  interior  defects. 

Disclaimer 

Reference  herein  to  any  specific  commercial  company,  product,  process,  or  service  by  trade  name, 
trademark,  manufacturer,  or  otherwise  does  not  necessarily  constitute  or  imply  its  endorsement, 
recommendation,  or  favoring  by  the  United  States  Government  or  the  Dept,  of  the  Army  (DoA).  The 
opinions  of  the  authors  expressed  herein  do  not  necessarily  state  or  reflect  those  of  the  United 
States  Government  or  the  DoD,  and  shall  not  be  used  for  advertising  or  product  endorsement 
purposes. 
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